Needs Assessment for Scientific Visualization of Multivariate, High-Dimensional Microarray Data
Abstract
The availability of genomic data is increasing exponentially, a trend magnified by the proliferation of data at every “omic” level. This explosive growth in biological data (GenBank currently contains over 44 billion base pairs and over 40 million sequences) creates a growing need for sophisticated mathematical and computational methods and software environments capable of handling the size and complexity of these “omic” datasets. Microarray data are no exception. Microarray technology allows the simultaneous analysis of an organism's entire genome, and the resulting datasets are high-dimensional, complex, and frequently difficult to interpret. To address these needs, we first set out to examine the need for, and subsequently the design of, advanced microarray data analysis software tools that give researchers new means of visualizing and analyzing their microarray data. Toward this goal, a survey research instrument entitled “Needs Assessment for Scientific Visualization of Microarray Data” was created and distributed (n = 500). The survey was submitted to and approved by Virginia Commonwealth University’s Institutional Review Board (VCU IRB#5065). It was distributed to a non-random sample of researchers and biomedical life scientists who currently use microarray methods in their day-to-day research. The survey results will be analyzed statistically and are expected to be instrumental in identifying a set of algorithmic and software needs to add to the currently available microarray analysis toolsets.
Similar resources
A framework for the visualization of multidimensional and multivariate data
High dimensionality is a major challenge for data visualization. Parameter optimization problems require an understanding of the behaviour of an objective function in an n-dimensional space around the optimum; this is multidimensional visualization and is a natural extension of the traditional domain of scientific visualization. Large numeric data tables with observations of many attributes requ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data by statistical methods to select valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...
An Integrated Environment for High-dimensional Geographic Data Mining
Introduction Geographic data are often very large in volume and “characterized by a high number of attributes or dimensions” [1]. There are urgent needs to develop effective and yet efficient approaches for analyzing such voluminous and high-dimensional data to address complex geographic problems [1, 2, 3, 4], e.g., detecting unknown multivariate patterns or relationships between socioeconomic,...
AVA: visual analysis of gene expression microarray data
SUMMARY AVA (Array Visual Analyzer) is a Java program that provides a graphical environment for visualization and analysis of gene expression microarray data. Together with its interactive visualization tools and a variety of built-in data analysis and filtration methods, AVA effectively integrates microarray data normalization, quality assessment, and data mining into one application. AVAILA...
Systematic Methods for Multivariate Data Visualization and Numerical Assessment of Class Separability and Overlap in Automated Visual Industrial Quality Control
The focus of this work is on systematic methods for the visualization and quality assessment with regard to classification of multivariate data sets. Our novel methods and criteria give in visual and numerical form rapid insight in the principal data distribution, the degree of compactness and overlap of class regions and class separability, as well as information to identify outliers in the da...